Detection of Outliers in Time Series Data

Authors

  • Samson Sifael Kiware
  • Praveen Madiraju
Abstract

Samson Kiware, B.A., Marquette University, 2010. This thesis addresses the detection of outliers in time series data. The data set used in this work is provided by the GasDay Project at Marquette University, which produces mathematical models to predict natural gas consumption for Local Distribution Companies (LDCs). Flow data free of outliers are required to develop and train accurate models. GasDay uses statistical approaches motivated by normally distributed samples, such as the 3σ rule and the 5σ rule, to aid experts in detecting outliers in the residuals from the models. However, the Jarque-Bera statistical test shows that the residuals from the GasDay models are not normally distributed. We explain Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and how it is used to detect time series outliers, and we introduce a new application of the DBSCAN algorithm by adapting it to detect outliers in natural gas flow. The performance of DBSCAN is compared with GasDay's existing technique. Five data sets from temperature-sensitive operating areas with identified outliers and 1000 data sets with synthetic outliers are used in the evaluation. The 1000 synthetic data sets are prepared using the same empirical distribution as one of the identified data sets. The results indicate that DBSCAN shows some improvement over GasDay's existing technique in detecting outliers and merits further exploration.
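The three techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how DBSCAN was adapted to gas flow data, which features were clustered, or which parameter values were used. The function names, the sample residuals, and the `eps` and `min_pts` values below are all hypothetical, and DBSCAN is applied here to scalar residuals only.

```python
from statistics import mean, pstdev


def three_sigma_outliers(residuals, k=3.0):
    """Indices of residuals farther than k standard deviations from the mean."""
    mu = mean(residuals)
    sigma = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mu) > k * sigma]


def jarque_bera(residuals):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4), with S the sample
    skewness and K the sample kurtosis. Large values suggest non-normality."""
    n = len(residuals)
    mu = mean(residuals)
    m2 = sum((r - mu) ** 2 for r in residuals) / n
    m3 = sum((r - mu) ** 3 for r in residuals) / n
    m4 = sum((r - mu) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def dbscan_1d(values, eps, min_pts):
    """Generic DBSCAN on scalar values. Returns one label per value;
    -1 marks noise, which is treated here as an outlier."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force region query; includes the point itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reach)
    return labels


if __name__ == "__main__":
    # Hypothetical residuals: 19 small values plus one large spike at index 19.
    inliers = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15] * 2 + [0.0]
    residuals = inliers + [8.0]

    print("3-sigma outliers:", three_sigma_outliers(residuals))
    print("Jarque-Bera statistic:", jarque_bera(residuals))
    labels = dbscan_1d(residuals, eps=0.12, min_pts=3)
    print("DBSCAN noise points:", [i for i, c in enumerate(labels) if c == -1])
```

The 3σ rule depends on the mean and standard deviation, which the outliers themselves inflate, whereas DBSCAN flags points that lack enough close neighbors and makes no normality assumption. That difference is one plausible reason to consider a density-based method when a normality test such as Jarque-Bera rejects the residuals, though the abstract itself does not spell out this reasoning.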


Similar resources

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

Control chart based on residues: Is a good methodology to detect outliers?

The purpose of this article is to evaluate the application of forecasting models along with the use of residual control charts to assess production processes whose samples have autocorrelation characteristics. The main objective is to determine the efficiency of control charts for individual observations (CCIO) and exponentially weighted moving average (EWMA) charts when they are applied to res...

On the Detection of Trends in Time Series of Functional Data

A sequence of functions (curves) collected over time is called a functional time series. Functional time series analysis is a popular research area in which statistics from such data are frequently observed. The main purpose of functional time series analysis is to predict and describe the random mechanisms that generated the data. To do so, it is necessary to decompose functional ti...

New optimized model identification in time series model and its difficulties

Model identification is an important and complicated step within the autoregressive integrated moving average (ARIMA) methodology framework. This step is especially difficult for integrated series. In this article, we first investigate the Box-Jenkins methodology and its shortcomings in model identification, and then discuss the problem of outliers in time series. By using this optimization method, we wil...

Introduction Package CircOutlier For Detection of Outliers in Circular-Circular Regression

One of the most important problems in any statistical analysis is the existence of unexpected observations. Some observations are not a part of the study and are known as outliers. Studies have shown that outliers affect the performance of standard statistical methods in modeling and prediction. The point of this work is to provide a couple of statistical package in R software to identi...


Publication date: 2016